Abstract. It is often useful to sort words into an order that reflects 
relations among their meanings as obtained by using a thesaurus. In this 
paper, we introduce a method of arranging words semantically by using 
several types of 'is-a' thesauri and a multi-dimensional thesaurus. We 
also describe three major applications where a meaning sort is useful 
and show the effectiveness of a meaning sort. Since there is no doubt 
that a word list in meaning-order is easier to use than a word list in 
some random order, a meaning sort, which can easily produce a word 
list in meaning-order, must be useful and effective. 

1 Using Msort 

Arranging words in an order that is based on their meanings is called a meaning 
sort (Msort). The Msort is a method of arranging words by their meanings rather 
than alphabetically. The method used to list the meanings is described in the 
next section. 

For example, suppose we obtain the following data in a research project:* 

an event 

a temple, a formal style, an alma mater, to take up one's post, the 
Imperial Household, a campus, Japan, the Soviet Union, the whole 
country, an agricultural village, a prefecture, a school, a festival, 
the head of a school, an established custom, a government official, 
a celebration, a Royal family 

* We actually obtained this data from the EDR co-occurrence dictionary [1]. 

This is a list of noun phrases (NPs), each followed by the word gyoji (an 
event) in the form NP X no gyoji (an event of NP X) in Japanese. To find the 
most useful way to examine the list, we first arrange the NPs alphabetically: 



an agricultural village, an alma mater, a campus, a celebration, 
an established custom, a festival, a formal style, a government 
official, the head of a school, the Imperial Household, Japan, a 
prefecture, a Royal family, a school, the Soviet Union, to take up 
one's post, a temple, the whole country 

This list is not easy to use, so we next arrange the NPs by frequency of 
appearance: 

an established custom, a school, a formal style, Japan, a prefecture, 
the whole country, a temple, an agricultural village, a Royal 
family, the Soviet Union, a festival, a campus, to take up one's 
post, a celebration, an alma mater, the Imperial Household, a 
government official, the head of a school 

Yet, even arranged this way, the list is too difficult to use. 
We then use an Msort to arrange the NPs semantically, by using the following 
categories: Human, Organization, and Action: 

(Human)         the Imperial Household, a Royal family, a government 
                official, the head of a school 
(Organization)  the whole country, an agricultural village, a 
                prefecture, Japan, the Soviet Union, a temple, 
                a school, a campus, an alma mater 
(Action)        a celebration, an established custom, a formal 
                style, to take up one's post, a festival 

This list is much easier to use than a listing in alphabetical or frequency 
order. Note that the words in each line are also arranged in an order that reflects 
relations among their meanings. For example, Japan and the Soviet Union are 
listed side by side, as are a school, a campus, and an alma mater. 

Although the list shows a variety of events, we can see at a glance that some 
are events related to certain special persons, and some are events related to a 
certain organization, and the others are miscellaneous forms of events. 

The Msort is also applicable to other situations as described in later sections. 
The Msort enables users to more easily and efficiently recognize and examine 
various types of problems. 

2 Implementing Msort 

To sort words in an order that reflects relations among their meanings, we first 
need to determine an order for the meanings. The Japanese thesaurus Bunrui 
Goi Hyou, an 'is-a' hierarchical thesaurus, is useful for this. We refer to it 
as BGH. In BGH, each word has a category number. In the electronic version of 
BGH, each word has a 10-digit category number that indicates seven levels of 
the 'is-a' hierarchy. The top five levels are expressed by the first five digits, the 
sixth level is expressed by the next two digits, and the last level is expressed by 
the last three digits. (Although we have used BGH, Msort can also be used with 
other thesauri in other languages.) 

Table 1. Modified BGH category numbers 

Semantic marker          Original code           Modified code 
Animal                   [1-3]56                 511 
Human                    12[0-4]                 52[0-4] 
Organization             [1-3]2[5-8]             53[5-8] 
Products                 [1-3]4[0-9]             61[0-9] 
Part of a living thing   [1-3]57                 621 
Plant                    [1-3]55                 631 
Nature                   [1-3]52                 641 
Location                 [1-3]17                 657 
Quantity                 [1-3]19                 711 
Time                     [1-3]16                 811 
Phenomenon               [1-3]5[01]              91[12] 
Abstract relation        [1-3]1[0-58]            aa[0-58] 
Human activity           [1-3]58, [1-3]3[0-8]    ab[0-9] 
Other                    4                       d 

The easiest way of implementing Msort is to arrange words in order of their 
category numbers. However, only arranging words semantically does not produce 
a convenient result. If the items arranged are numbers, the order is clear, but 
there is no clear order for words. It is thus convenient to insert a mark, as a 
kind of bookmark, in certain places. We used semantic markers such as Human, 
Organization and Action as bookmarks. 

These markers were created by combining nominal semantic markers in the 
IPAL verbal dictionary with the BGH classification system. Table 1 shows the 
modified category numbers obtained by integrating these new markers with the 
BGH codes. The first three digits of each category number have been changed. 
For example, the notation [1-3]56 and 511 in the first line means that when the 
first three digits of the category number are 156, 256, or 356, those digits will 
be changed to 511. ([1-3] means 1, 2, or 3.) 
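
As an illustration only, the transformation in Table 1 can be sketched as a 
list of rewrite rules applied to the first three digits of a category number. 
The function name and the sample input below are hypothetical. The rows for 
Phenomenon and Human activity are left out because Table 1 does not state how 
their last digit is mapped. 

```python
import re

# (pattern on the leading digits, replacement) pairs transcribed from Table 1.
# Phenomenon and Human activity are omitted: the table does not spell out
# how their third digit is mapped, so any rule here would be a guess.
RULES = [
    (re.compile(r"^[1-3]56"), "511"),            # Animal
    (re.compile(r"^12([0-4])"), r"52\1"),        # Human
    (re.compile(r"^[1-3]2([5-8])"), r"53\1"),    # Organization
    (re.compile(r"^[1-3]4([0-9])"), r"61\1"),    # Products
    (re.compile(r"^[1-3]57"), "621"),            # Part of a living thing
    (re.compile(r"^[1-3]55"), "631"),            # Plant
    (re.compile(r"^[1-3]52"), "641"),            # Nature
    (re.compile(r"^[1-3]17"), "657"),            # Location
    (re.compile(r"^[1-3]19"), "711"),            # Quantity
    (re.compile(r"^[1-3]16"), "811"),            # Time
    (re.compile(r"^[1-3]1([0-58])"), r"aa\1"),   # Abstract relation
    (re.compile(r"^4"), "d"),                    # Other
]


def modify_category_number(original: str) -> str:
    """Rewrite the leading digits of a category number using the first matching rule."""
    for pattern, replacement in RULES:
        if pattern.match(original):
            return pattern.sub(replacement, original, count=1)
    return original  # no rule applies in this sketch


# "2198007013" is a hypothetical original number, not taken from BGH.
print(modify_category_number("2198007013"))  # prints 7118007013
```

The modified codes use digits first and letters later, so a plain string 
comparison of the results already gives the marker order shown in Table 1. 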

The process of using an Msort is explained by applying it to the data set 
listed in Section 1, obtained for the word gyoji (an event), as follows: 

1. First, we give each word a new category number according to the 
transformation shown in Table 1, to obtain the results shown in Table 2(a). A 
temple occurs twice, and a formal style occurs four times. This indicates 
that both a temple and a formal style have multiple meanings. In the BGH 
thesaurus, a temple is defined as having two meanings, and a formal style 
is defined as having four meanings. 



Table 2. An example of the Msort process 

(a) Examples with BGH category numbers 

5363005022  a temple 
5363005021  a temple 
ab18207012  a formal style 
ab21509016  a formal style 
aa11011014  a formal style 
ab70004013  a formal style 
5363013015  an alma mater 
ab41201016  to take up one's post 
5210007021  the Imperial Household 
5363010015  a campus 
5359001012  Japan 
5359004192  the Soviet Union 
7118007013  the whole country 
5353007012  the whole country 
5354006033  an agricultural village 
5355004017  a prefecture 
5363010012  a school 
ab46002012  a festival 
5241023012  the head of a school 
ab18205021  an established custom 
5233004015  a government official 
5241101061  a government official 
ab14308013  a celebration 
ab46019012  a celebration 
5210007022  a Royal family 



2. We then add semantic markers to the set of words in Table 2(a) to get the 
results shown in Table 2(b). 

3. Next, we arrange the items in Table 2(b) in the order of their category 
numbers to get the results shown in Table 2(c). 

4. Finally, we convert the data into a form that is easier to use. For example, 
when we delete the category numbers, redundant words with the same 
semantic marker in a line, and semantic markers to which no words correspond, 
we obtain the data shown in Table 4. 

This data is much easier to use than the data shown in the other tables. 
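
The four steps above can be sketched in a few lines of Python. This is a 
minimal sketch, not the implementation used in this work: the function name 
msort and the data layout are assumptions, while the category numbers and 
marker names are those of Table 2. 

```python
# Semantic markers with their category numbers, as listed in Table 2(b).
MARKERS = [
    ("5100000000", "Animal"), ("5200000000", "Human"),
    ("5300000000", "Organization"), ("6100000000", "Product"),
    ("6200000000", "Part of a living thing"), ("6300000000", "Plant"),
    ("6400000000", "Nature"), ("6500000000", "Location"),
    ("7100000000", "Quantity"), ("8100000000", "Time"),
    ("9100000000", "Phenomenon"), ("aa00000000", "Abstract relation"),
    ("ab00000000", "Human activity"), ("d000000000", "Other"),
]

# Step 1 result: (modified category number, word) pairs from Table 2(a).
ENTRIES = [
    ("5363005022", "a temple"), ("5363005021", "a temple"),
    ("ab18207012", "a formal style"), ("ab21509016", "a formal style"),
    ("aa11011014", "a formal style"), ("ab70004013", "a formal style"),
    ("5363013015", "an alma mater"), ("ab41201016", "to take up one's post"),
    ("5210007021", "the Imperial Household"), ("5363010015", "a campus"),
    ("5359001012", "Japan"), ("5359004192", "the Soviet Union"),
    ("7118007013", "the whole country"), ("5353007012", "the whole country"),
    ("5354006033", "an agricultural village"), ("5355004017", "a prefecture"),
    ("5363010012", "a school"), ("ab46002012", "a festival"),
    ("5241023012", "the head of a school"),
    ("ab18205021", "an established custom"),
    ("5233004015", "a government official"),
    ("5241101061", "a government official"),
    ("ab14308013", "a celebration"), ("ab46019012", "a celebration"),
    ("5210007022", "a Royal family"),
]


def msort(entries, markers):
    # Steps 2 and 3: mix the markers in with the words, then sort all rows by
    # category number. Plain string order works because digits sort before letters.
    rows = [(code, label, True) for code, label in markers]
    rows += [(code, word, False) for code, word in entries]
    rows.sort(key=lambda row: row[0])

    # Step 4: drop the numbers, drop repeated words under the same marker,
    # and drop markers to which no word corresponds.
    groups = []
    for _code, text, is_marker in rows:
        if is_marker:
            groups.append((text, []))
        elif groups and text not in groups[-1][1]:
            groups[-1][1].append(text)
    return [(label, words) for label, words in groups if words]


for label, words in msort(ENTRIES, MARKERS):
    print(f"({label}) " + ", ".join(words))
```

The printed groups follow Table 4, except that the sketch keeps the marker 
names of Table 2(b) (Abstract relation, Human activity) where Table 4 uses the 
shorter labels Relation and Action. 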



3 Msort using different dictionaries 
3.1 Msort using a different 'is-a' thesaurus 

In Section 2, we described the implementation of an Msort using the BGH 
thesaurus. This is the most suitable 'is-a' thesaurus for an Msort because each 
word that it contains is assigned a category number. This section examines whether 
an Msort can be used with an 'is-a' hierarchical thesaurus that has no category 
numbers, such as the EDR dictionary. 

It is useful to treat the definition sentence of the concept in each node of 
an 'is-a' thesaurus as the number of the level. If we do this, it is not necessary 
to create a new number. For example, the definitions of concepts from the top 
node to the node of the term "an alma mater" are as shown in Table 3. 

Table 2. Example of the Msort process 

(b) Adding semantic markers for divisions 

5100000000  (Animal) 
5200000000  (Human) 
5300000000  (Organization) 
6100000000  (Product) 
6200000000  (Part of a living thing) 
6300000000  (Plant) 
6400000000  (Nature) 
6500000000  (Location) 
7100000000  (Quantity) 
8100000000  (Time) 
9100000000  (Phenomenon) 
aa00000000  (Abstract relation) 
ab00000000  (Human activity) 
d000000000  (Other) 
5363005022  a temple 
5363005021  a temple 
ab18207012  a formal style 
ab21509016  a formal style 
aa11011014  a formal style 
ab70004013  a formal style 
5363013015  an alma mater 
ab41201016  to take up one's post 
5210007021  the Imperial Household 
5363010015  a campus 
5359001012  Japan 
5359004192  the Soviet Union 
7118007013  the whole country 
5353007012  the whole country 
5354006033  an agricultural village 
5355004017  a prefecture 
5363010012  a school 
ab46002012  a festival 
5241023012  the head of a school 
ab18205021  an established custom 
5233004015  a government official 
5241101061  a government official 
ab14308013  a celebration 
ab46019012  a celebration 
5210007022  a Royal family 

(c) Arranging elements in the order of their category number 

5100000000  (Animal) 
5200000000  (Human) 
5210007021  the Imperial Household 
5210007022  a Royal family 
5233004015  a government official 
5241023012  the head of a school 
5241101061  a government official 
5300000000  (Organization) 
5353007012  the whole country 
5354006033  an agricultural village 
5355004017  a prefecture 
5359001012  Japan 
5359004192  the Soviet Union 
5363005021  a temple 
5363005022  a temple 
5363010012  a school 
5363010015  a campus 
5363013015  an alma mater 
6100000000  (Product) 
6200000000  (Part of a living thing) 
6300000000  (Plant) 
6400000000  (Nature) 
6500000000  (Location) 
7100000000  (Quantity) 
7118007013  the whole country 
8100000000  (Time) 
9100000000  (Phenomenon) 
aa00000000  (Abstract relation) 
aa11011014  a formal style 
ab00000000  (Human activity) 
ab14308013  a celebration 
ab18205021  an established custom 
ab18207012  a formal style 
ab21509016  a formal style 
ab41201016  to take up one's post 
ab46002012  a festival 
ab46019012  a celebration 
ab70004013  a formal style 
d000000000  (Other) 

Table 3. Definitions of concepts from the top node to the node of the term "an 
alma mater" 

concept 
  agent 
    autonomous being 
      organization 
        educational organization 
          an organization to provide education, called a school 
            a school at which a person was or is a student 

Table 4. Results of an Msort using the BGH thesaurus 

(Human)         the Imperial Household, a Royal family, a government official, 
                the head of a school 
(Organization)  the whole country, an agricultural village, a prefecture, Japan, 
                the Soviet Union, a temple, a school, a campus, an alma mater 
(Quantity)      the whole country 
(Relation)      a formal style 
(Action)        a celebration, an established custom, a formal style, to take up 
                one's post, a festival 

When we do a meaning sort using the EDR dictionary, we only have to 
consider the connections of the hierarchy of meanings "concept: agent: autonomous 
being: organization: educational organization: an organization to provide 
education, called a school: a school at which a person was or is a student" as the 
category number. 

Some results of a meaning sort using the EDR dictionary are shown in Table 5. 
We used the first three definition terms as division markers. 
The above analysis demonstrates that a meaning sort can be done using any 
'is-a' thesaurus. However, there is a problem in that the order of the 
branching-point nodes of a hierarchical structure is ambiguous. In the case shown 
in Table 5, the order is the alphabetical order of the strings in the definition 
terms. It is better to specify the order manually, but if this is too difficult, 
it is better to do a meaning sort of the definition terms themselves by using 
another dictionary or thesaurus, e.g. the BGH thesaurus. 
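
As a rough sketch of this idea, the chain of definition terms can be used 
directly as a sort key, with the first three terms serving as the division 
marker. Only the path for "an alma mater" is taken in full from Table 3. The 
other paths are cut off after the three terms shown in Table 5 because their 
deeper levels are not given here, and the function name is hypothetical. 

```python
from itertools import groupby

# Full path from Table 3.
ALMA_MATER_PATH = (
    "concept", "agent", "autonomous being", "organization",
    "educational organization",
    "an organization to provide education, called a school",
    "a school at which a person was or is a student",
)

# (path of definition terms, word). Paths other than ALMA_MATER_PATH are
# truncated to the three levels listed in Table 5 (a simplification).
ENTRIES = [
    (("concept", "space", "location"), "Japan"),
    (("concept", "event", "phenomenon"), "a festival"),
    (ALMA_MATER_PATH, "an alma mater"),
    (("concept", "agent", "human being"), "the head of a school"),
    (("concept", "event", "action"), "a celebration"),
    (("concept", "agent", "autonomous being"), "a school"),
]


def msort_by_path(entries, marker_depth=3):
    """Sort by definition path and group words under the first marker_depth terms."""
    # Tuples of strings compare term by term, so sibling nodes end up in
    # alphabetical order of their definition terms.
    ordered = sorted(entries, key=lambda entry: entry[0])
    return [
        (marker, [word for _path, word in group])
        for marker, group in groupby(ordered, key=lambda e: e[0][:marker_depth])
    ]


for marker, words in msort_by_path(ENTRIES):
    print("(" + ": ".join(marker) + ") " + ", ".join(words))
```

Replacing the key function is one way to impose a manually specified order on 
the branching-point nodes instead of the alphabetical one. 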



3.2 Msort using a dictionary where each word is expressed with a 
set of multiple features 

Table 5. Results of an Msort using the EDR dictionary 

(concept  agent   autonomous being)  a school, a campus, an alma mater, a temple, 
                                     a prefecture, the Soviet Union, Japan, 
                                     a Royal family, the Imperial Household, 
                                     the head of a school, a government official 
(concept  agent   human being)       a temple, a prefecture, the head of a school, 
                                     a government official 
(concept  event   action)            a celebration, to take up one's post 
(concept  event   phenomenon)        a festival 
(concept  matter  event)             a festival, an established custom, 
                                     a celebration 
(concept  matter  thing)             a temple, a school, a prefecture, the head 
                                     of a school, a government official, 
                                     a celebration, a formal style 
(concept  space   location)          a temple, a school, the whole country, 
                                     a prefecture, an agricultural village, 
                                     the Soviet Union, Japan 

Note: This table was obtained by using a Japanese dictionary. In the table, "a temple" 
and "a prefecture" belong to the category "human being." In Japanese, "a temple" 
and "a prefecture" have many meanings, including "human being." 

In some dictionaries, each word is expressed with a set of multiple features [5] 
[12]. For example, research on the IPAL Japanese generative dictionary [7] 
gives multiple features to various words that denote containers, as shown in 
Table 6. In this table, "—" means that the feature value is not specified. 

Table 6. Example of a dictionary in which each word is assigned multiple 
features 

Word                         Style     Object        Depth    Size  Material 
utsuwa (a container)         —         —             —        —     — 
wan1 (a ceramic bowl)        Oriental  —             deep     —     ceramic 
wan2 (a wooden bowl)         Oriental  —             deep     —     wooden 
yunomi (a Japanese teacup)   Oriental  Japanese tea  deep     —     ceramic 
sara (a plate)               —         —             shallow  —     — 

It is possible to do an Msort in the case of such a dictionary. We have only to 
treat the information as if each feature is equivalent to a level in an imaginary 
'is-a' thesaurus. In Table 6, if we assume that the features, from left to right, 
correspond to the levels, from top to bottom, of an imaginary thesaurus, the 
levels become Style, Object, Depth, Size, and Material, and a category number 
represents Style:Object:Depth:Size:Material, which is essentially the same 
situation as for the EDR data. For example, the category number of wan2 (a wooden 
bowl) is Oriental:—:deep:—:wooden. (Actually, in order to do an Msort of 
feature values, we may change Oriental, deep, and wooden into the corresponding 
category numbers in BGH.) We simply do an Msort, assuming that each word 
has such a category number. The result of this Msort is shown in Table 7. 

Table 7. Result of an Msort, from the leftmost feature 

Word                         Style     Object        Depth    Size  Material 
utsuwa (a container)         —         —             —        —     — 
sara (a plate)               —         —             shallow  —     — 
wan1 (a ceramic bowl)        Oriental  —             deep     —     ceramic 
wan2 (a wooden bowl)         Oriental  —             deep     —     wooden 
yunomi (a Japanese teacup)   Oriental  Japanese tea  deep     —     ceramic 

Table 8. The result of an Msort, from the rightmost feature 

Word                         Style     Object        Depth    Size  Material 
utsuwa (a container)         —         —             —        —     — 
sara (a plate)               —         —             shallow  —     — 
wan1 (a ceramic bowl)        Oriental  —             deep     —     ceramic 
yunomi (a Japanese teacup)   Oriental  Japanese tea  deep     —     ceramic 
wan2 (a wooden bowl)         Oriental  —             deep     —     wooden 

Table 7 shows the result of an Msort based on the supposition that the 
leftmost feature is the most important. Which feature is most important is, in 
fact, not clear. For example, if we suppose that the rightmost feature is the most 
important and we do an Msort from that feature, we get a different result, as 
shown in Table 8. From a dictionary with multiple features, we can get various 
results of Msorts in this way, by changing the features which are thought to be 
most important. This means that users can do an Msort in any order of features 
that they may be interested in. This kind of dictionary, that is, the kind which 
provides multiple features, is therefore very flexible. 
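
A small sketch may make the effect of the feature order concrete. It uses 
the five container words and their feature values, with None standing for an 
unspecified value. Two points are assumptions of the sketch rather than part 
of the method: unspecified values are placed first, and specified values are 
compared alphabetically instead of by BGH category numbers. 

```python
FEATURES = ("Style", "Object", "Depth", "Size", "Material")

# Feature values for each word; None means the value is not specified.
DICTIONARY = {
    "utsuwa (a container)": (None, None, None, None, None),
    "wan1 (a ceramic bowl)": ("Oriental", None, "deep", None, "ceramic"),
    "wan2 (a wooden bowl)": ("Oriental", None, "deep", None, "wooden"),
    "yunomi (a Japanese teacup)": ("Oriental", "Japanese tea", "deep", None, "ceramic"),
    "sara (a plate)": (None, None, "shallow", None, None),
}


def msort_by_features(dictionary, feature_order):
    """Sort words, treating feature_order[0] as the most important feature."""
    positions = [FEATURES.index(name) for name in feature_order]

    def key(word):
        values = dictionary[word]
        # (False, "") for an unspecified value sorts before any (True, value).
        return tuple((values[i] is not None, values[i] or "") for i in positions)

    return sorted(dictionary, key=key)


from_left = msort_by_features(DICTIONARY, FEATURES)
from_right = msort_by_features(DICTIONARY, tuple(reversed(FEATURES)))
print(from_left)   # utsuwa, sara, wan1, wan2, yunomi
print(from_right)  # utsuwa, sara, wan1, yunomi, wan2
```

Passing any other permutation of the feature names gives the corresponding 
interest-based ordering. 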

When a hierarchical thesaurus is used to examine this, there are further 
interesting results. We can assume that each feature corresponds to a level of the 
hierarchical thesaurus, so we can construct many kinds of hierarchical thesauri 
by changing the correspondence between levels and features. For example, we 
can construct the hierarchical thesaurus shown in Figure 1 from the result of 
an Msort from the leftmost feature as shown in Table 7. We can construct the 
hierarchical thesaurus shown in Figure 2 from the result of an Msort from the 
rightmost feature as shown in Table 8. In the thesaurus of Figure 1, we can see 
the semantic similarity between wan1 (a ceramic bowl) and wan2 (a wooden 
bowl). In the thesaurus of Figure 2, we can understand that wan1 (a ceramic 
bowl) and yunomi (a Japanese teacup) are semantically similar in that they are 
both ceramic. Such construction of multiple thesauri has led to further research 
into a multi-dimensional thesaurus. The necessity for a multi-dimensional 
thesaurus was discussed in Kawamura's paper, which argued that if we 
divide a bird and an airplane into other categories at a relatively higher level 
of a hierarchy than the level at which entries are divided according to whether 
the item can fly or not, we will not be able to see that a bird and an airplane 
are semantically similar in that they can both fly. Therefore a dictionary with 
multiple features, which can be flexibly reconfigured into hierarchical thesauri of 
many kinds, would be very useful, and the construction of such a dictionary is 
necessary for reasons of practicality. Also, we have our doubts as to whether it 
is necessary to make a word dictionary in the form of a hierarchical thesaurus. 
Looking at Table 6, because all the features of utsuwa (a container) are "—", 
representing no specification of feature values, we are able to see that utsuwa (a 
container) is super-ordinate to the other words by looking at the information on 
the multiple features. We can estimate super-ordinate and subordinate relations 
from the inclusion relationships of features, so construction of a hierarchical 
thesaurus as such is not necessary. A dictionary with multiple features is all 
that is necessary. Furthermore, a dictionary with multiple features has a further 
advantage in that we can define the similarity of two words in terms of the 
proportion of features that are the same for both words. Although higher-order 
predicate logic and natural language sentences can be thought of as the true 
semantic descriptions of words, we think that a dictionary using multiple features 
would be useful in that it can be handled by existing natural language processing 
techniques, and can handle various multi-dimensional thesauri. 

                        utsuwa 
                     (a container) 

wan1              wan2              yunomi 
(a ceramic bowl)  (a wooden bowl)   (a Japanese teacup) 

Fig. 1. Hierarchical thesaurus of meaning sort from the leftmost feature 

If such a dictionary is constructed, it would be convenient for meaning sort, 
since it would allow users to do interest-based meaning sort. 
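
Both ideas, reading super-ordinate relations off feature inclusion and 
measuring similarity by shared features, can be sketched as follows. The exact 
definitions are assumptions: here a word is super-ordinate to another if all 
of its specified features appear with the same value in the other word, and 
similarity is the number of features specified identically in both words 
divided by the number of features specified in at least one of them. 

```python
from fractions import Fraction

# Specified features only; an absent key means the value is not specified.
FEATURES_OF = {
    "utsuwa": {},
    "wan1": {"Style": "Oriental", "Depth": "deep", "Material": "ceramic"},
    "wan2": {"Style": "Oriental", "Depth": "deep", "Material": "wooden"},
    "yunomi": {"Style": "Oriental", "Object": "Japanese tea",
               "Depth": "deep", "Material": "ceramic"},
    "sara": {"Depth": "shallow"},
}


def is_superordinate(general, specific):
    """True if every specified feature of `general` also holds for `specific`."""
    g, s = FEATURES_OF[general], FEATURES_OF[specific]
    return g != s and all(s.get(name) == value for name, value in g.items())


def similarity(a, b):
    """Proportion of features that both words specify with the same value."""
    fa, fb = FEATURES_OF[a], FEATURES_OF[b]
    names = set(fa) | set(fb)
    if not names:
        return Fraction(0)  # nothing specified on either side
    shared = sum(1 for n in names if n in fa and n in fb and fa[n] == fb[n])
    return Fraction(shared, len(names))


print(is_superordinate("utsuwa", "sara"))  # True
print(similarity("wan1", "wan2"))          # 2/3
print(similarity("wan1", "yunomi"))        # 3/4
```

Other denominators, for example the total number of features in the 
dictionary, would be equally consistent with the description above. 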

4 Three examples of using an Msort 

In this section, we describe three major applications for which an Msort is useful. 



                        utsuwa 
                     (a container) 

wan1              yunomi               wan2 
(a ceramic bowl)  (a Japanese teacup)  (a wooden bowl) 

Fig. 2. Hierarchical thesaurus of meaning sort from the rightmost feature 



4.1 Dictionary construction 

Table 9 shows, as an example, the construction of a case frame for the verb eat 
according to data in a noun-verb relational dictionary. The table shows the results 
of an Msort of NPs which may be taken as case elements of eat. It is easy to 
manually construct a case-frame dictionary from such data, as shown in Table 9. 
The nominative case of eat consists of agents, such as animals and people, and 
the objective case consists of various NPs mainly meaning foods. Regarding the 
optional case, various phrases such as by myself, in an office, and in a meeting 
are also included. 

The construction of a verbal case-frame dictionary is one example of the 
potential applications of an Msort. A similar construction process can also be 
easily applied to copulas and other kinds of relationships among words. An Msort 
is not only useful for constructing dictionaries, but also for examining data and 
extracting important information in language investigation. An Msort is also 
useful for examining data in the process of knowledge acquisition. 



4.2 Tagged corpus construction (related to semantic similarity) 

Recently, various corpora have been under construction, and the 
investigation of corpus-based learning algorithms is attracting much attention. 
In this section, we demonstrate how an Msort can be useful in the construction 
of corpora. 

Table 9. Example construction of a case frame of the verb eat 

(a) Results of an Msort of terms in the nominative case 

(Animal)      cattle, a calf, fish 
(Human)       we, us, all, myself, babies, a parent, a sister, a customer, a 
              Japanese, a nurse, a writer 

(b) Results of an Msort of terms in the objective case 

(Animal)      an animal, shellfish, plankton 
(Product)     prey, a product, a material, food, feed, Japanese food, 
              Japanese-style food, Western food, Chinese food, a rice ball, 
              gruel, sushi, Chinese noodles, macaroni, sandwiches, a pizza, 
              a steak, a barbecued dish, tempura, fried food, cereals, rice, 
              white rice, Japanese rice, barley, kimchi, sugar, jam, 
              a confection, a cake, a cookie, ice cream 
(Body part)   the mortal remains, the liver 
(Plant)       a gene, a plant, grass, a sweet pepper, chicory, a mulberry, a 
              banana, a matsutake mushroom, kombu 
(Phenomenon)  a delicate flavor, snow 
(Relation)    the content 
(Activity)    breakfast, lunch, dinner, supper 

(c) Results of an Msort of terms in the optional cases 
(In Japanese, "in", "on", and "by" are expressed by the same word, so we 
cannot divide data according to "in" or "by".) 

(Human)         (by) myself 
(Organization)  (in) an office, (in) a restaurant, (in) a hotel 
(Product)       (by) soy sauce, (in) a dressing room, (in) bed, (on) a table 
(Location)      (on) the spot, (in) the whole area, (on) a train 
(Quantity)      (by) two persons, (at) a rate, (by) many people 
(Activity)      (at) work, (in) a meeting 

Suppose that we want to disambiguate the meanings of "of" in NP X of NP Y 
by using the example-based method. In this case, we need a tagged corpus 
for semantic analysis of the noun phrases in NP X of NP Y. We attach semantic 
relationships such as Part-of and Location to each example of the noun phrases. 
When we do an Msort of these phrases, similar examples are grouped together 
and the tagging of semantic relationships by hand is made easier. 

Table 10. Construction of a manually tagged corpus for the semantic analysis 
of noun phrases in "NP X of NP Y" 

NP X          NP Y                   Semantic relation 
[illegible]   [illegible]            [illegible] 
an affair     a junior high school   Location 
an affair     an army                Location 
an affair     an album               Indirect-determiner 
an affair     a tanker               Indirect-determiner 
an affair     the worst              Adjective-feature 
an affair     the largest            Adjective-feature 
a property    the circumference      Location 
items         both countries         Object-agent 
items         documentary records    Field-determiner 
items         a general meeting      Object-agent 
a provision   the Upper House        Field-determiner 
a provision   a new law              Field-determiner 
a provision   a treaty               Field-determiner 
a provision   an agreement           Field-determiner 

Table 10 shows part of a manually tagged corpus. In this example we have 
supposed that NP X in NP X of NP Y will be the more important NP, so we 
first did an Msort of NP X, and then did one of NP Y. Although the technical 
terms representing the semantic relationships in the table are specialized, it can 
be seen that the examples which are grouped together by this Msort often have 
the same relationship. Also, when semantically similar examples are grouped 
together like this, the cost of tagging is decreased. 
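
Sorting on NP X first and NP Y second amounts to a two-part sort key. The 
sketch below illustrates this with a few of the tagged examples. The category 
numbers are made-up placeholders, not real BGH codes, and are chosen only so 
that similar words receive nearby numbers. 

```python
# Placeholder category numbers (NOT real BGH codes), for illustration only.
PLACEHOLDER_CODE = {
    "an affair": "ab10", "a provision": "ab20",           # NP X
    "a junior high school": "5351", "an army": "5363",    # NP Y
    "an album": "6101", "a tanker": "6102",
    "a new law": "ab30", "a treaty": "ab31",
}

# (NP X, NP Y, hand-assigned semantic relation), in arbitrary input order.
EXAMPLES = [
    ("an affair", "a tanker", "Indirect-determiner"),
    ("a provision", "a treaty", "Field-determiner"),
    ("an affair", "an army", "Location"),
    ("a provision", "a new law", "Field-determiner"),
    ("an affair", "an album", "Indirect-determiner"),
    ("an affair", "a junior high school", "Location"),
]


def msort_pairs(examples, code_of):
    """Msort on NP X first, then on NP Y within each NP X."""
    return sorted(examples, key=lambda ex: (code_of[ex[0]], code_of[ex[1]]))


for np_x, np_y, relation in msort_pairs(EXAMPLES, PLACEHOLDER_CODE):
    print(f"{np_x:12} {np_y:22} {relation}")
```

With these placeholder numbers, examples carrying the same relation end up 
on adjacent rows, which is the effect described above. 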

In the example-based method, the tag attached to the example that is the 
most similar to the input phrase is judged to be the result of the analysis. An 
Msort performs the function of grouping similar examples. The example-based 
method and the Msort both use word similarity, and this is an advantage of both 
techniques. 

In this section, we noted that using the Msort is an efficient way to construct 
a noun-phrase corpus. In addition, whenever a corpus is built from words, we can 
also use an Msort for its construction. 

4.3 Information retrieval 

Information retrieval activity has increased with the growth of the Internet. An 
Msort can also easily be applied to this area. 

For example, in research conducted by Tsuda and Senda, the features of 
a document database were displayed to users by using multiple keywords. 



For example, assume that the document database we want to display has the 
following set of keywords. 



retrieval, a word, a document, construction, candidate, 
a number, a keyword 



Displaying the list of words in a random order is not very convenient for 
users. However, if we do an Msort of the keywords, we can obtain the following 
list: 

(Quantity)           a number 
(Abstract relation)  candidate 
(Human activity)     retrieval 
                     a document, a keyword, a word 
                     construction 

(Here, we have displayed words with the same first three-digit BGH category 
number on the same line.) This method provides a more useful perspective for 
users. 

In some cases we may display many keywords and ask the users to select the 
appropriate ones [13]. In such a case, if we do not have another way of arranging 
the words in an appropriate order, it is convenient for users if we use an Msort. 
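
The keyword display above can be approximated with two nested groupings: one 
by semantic marker and one by the first three characters of the category 
number. In this sketch the marker is looked up from the first two characters, 
and the category numbers are placeholders rather than real BGH codes. Only 
their first three characters are chosen to agree with the grouping shown in 
the example. 

```python
from itertools import groupby

# Marker lookup by the first two characters (subset of the markers in Table 2).
MARKER_BY_PREFIX = {"71": "Quantity", "aa": "Abstract relation", "ab": "Human activity"}

# (placeholder category number, keyword); the numbers are NOT real BGH codes.
CODED_KEYWORDS = [
    ("ab10000001", "retrieval"), ("ab20000003", "a word"),
    ("ab20000001", "a document"), ("ab70000001", "construction"),
    ("aa10000001", "candidate"), ("7110000001", "a number"),
    ("ab20000002", "a keyword"),
]


def keyword_display(coded_keywords):
    """Group keywords by marker, then put words sharing three leading characters on one line."""
    ordered = sorted(coded_keywords)  # sorts by category number first
    display = []
    for prefix, block in groupby(ordered, key=lambda cw: cw[0][:2]):
        lines = [
            [word for _code, word in line]
            for _head, line in groupby(block, key=lambda cw: cw[0][:3])
        ]
        display.append((MARKER_BY_PREFIX[prefix], lines))
    return display


for marker, lines in keyword_display(CODED_KEYWORDS):
    for number, line in enumerate(lines):
        label = f"({marker})" if number == 0 else ""
        print(f"{label:20} " + ", ".join(line))
```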

5 Conclusion 

In summary, we have introduced a useful method of arranging words semanti- 
cally and shown how to implement it by using thesauri. We gave three major 
examples of the applications of an Msort (dictionary construction, tagged corpus 
construction, and information presentation). 

Since there is no doubt that a word list in a meaning-order is easier to use 
than a word list in a random-order, the Msort, which can easily produce a word 
list in a meaning-order, must be useful and effective. 

The Msort is a very useful tool for natural language processing, and NLP 
research can be made more efficient by applying it. 